---
title: "Untitled"
output: 
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source: embed
---

```{r setup, include=FALSE}
library(flexdashboard)
library(readr)
library(knitr)
library(tidyverse)
library(purrr)
library(broom)
library(plotly)
startrek <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-08-17/computer.csv')

```


### Tea, Earl Grey, Hot: Designing Speech Interactions from the Imagined Ideal of Star Trek

```{r}
include_graphics('https://raw.githubusercontent.com/LaurS12/ERHS535_Group_Project/main/Images/data_description.png')

# Note: the image may not preview in the editor, but it renders correctly when knitted.
```

***

Speech is now common in daily interactions with our devices, thanks to voice user interfaces (VUIs) like Alexa. Despite their seeming ubiquity, designs often do not match users’ expectations. Science fiction, which is known to influence design of new technologies, has included VUIs for decades. Star Trek: The Next Generation is a prime example of how people envisioned ideal VUIs. Understanding how current VUIs live up to Star Trek’s utopian technologies reveals mismatches between current designs and user expectations, as informed by popular fiction. Combining conversational analysis and VUI user analysis, we study voice interactions with the Enterprise’s computer and compare them to current interactions. Independent of futuristic computing power, we find key design-based differences: Star Trek interactions are brief and functional, not conversational, they are highly multimodal and context-driven, and there is often no spoken computer response. From this, we suggest paths to better align VUIs with user expectations.

Source: https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-08-17/readme.md


### Chart 1: Lauren

```{r}

```

### Chart 2: Kim

```{r}

```

### Verbal vs. Non-Verbal Computer Response Proportions per Primary Types of Voice Interactions By Person

```{r, results='hide'}
# Character labels that denote the computer itself rather than a person
computer_voices <- c("Computer Voice", "Computer", "Computer (V.O.)",
                     "Computer (V.O)", "Computer Voice (V.O.)",
                     "New Computer Voice", "Com Panel (V.O.)",
                     "Computer'S Voice", "Computer (Voice)",
                     "Computer Voice (Cont'D)")

# Keep only utterances spoken by people
no_comp_voice <- startrek %>%
  filter(!is.na(char), !char %in% computer_voices)

no_comp_voice <- no_comp_voice %>% 
  select('pri_type', 'nv_resp')

no_comp_voice$nv_resp <- as.factor(no_comp_voice$nv_resp)
no_comp_voice$pri_type <- as.factor(no_comp_voice$pri_type)

# nv_resp is TRUE when the computer responded non-verbally
no_comp_voice$nv_resp <- no_comp_voice$nv_resp %>% 
  recode_factor("TRUE" = "Non-Verbal Response",
                "FALSE" = "Verbal Response")

levels(no_comp_voice$pri_type)

no_comp_voice <- no_comp_voice %>% 
  group_by(pri_type, nv_resp) %>% 
  tally()

# One column per response type; combinations that never occur count as 0
no_comp_voice <- no_comp_voice %>% 
  pivot_wider(names_from = nv_resp, values_from = n, values_fill = 0)

no_comp_voice <- no_comp_voice %>% 
  rename(n_verbal = "Verbal Response") %>% 
  rename(n_non_verbal = "Non-Verbal Response")

no_comp_voice$total_resp <- no_comp_voice$n_verbal + no_comp_voice$n_non_verbal

no_comp_voice

prop_verbal <- no_comp_voice %>% 
  mutate(prop_test = purrr::map2(.x= n_verbal,
                                 .y= total_resp,
                                 .f= prop.test))


prop_verbal <- prop_verbal %>% 
  mutate(prop_tidy = purrr::map(prop_test, ~tidy(.x)))

prop_non_verbal <- no_comp_voice %>% 
  mutate(prop_test = purrr::map2(.x= n_non_verbal,
                                 .y= total_resp,
                                 .f= prop.test))


prop_non_verbal <- prop_non_verbal %>% 
  mutate(prop_tidy = purrr::map(prop_test, ~tidy(.x)))

prop_verbal <- prop_verbal%>% 
  unnest(prop_tidy)

prop_non_verbal <- prop_non_verbal%>% 
  unnest(prop_tidy)

prop_verbal <- prop_verbal %>% 
  select(-prop_test)

prop_non_verbal <- prop_non_verbal %>% 
  select(-prop_test)

prop_verbal <- prop_verbal %>% 
  select(pri_type, estimate, conf.low, conf.high, n_verbal) 

prop_non_verbal <- prop_non_verbal %>% 
  select(pri_type, estimate, conf.low, conf.high, n_non_verbal) 

prop_verbal <- prop_verbal %>% 
  mutate(estimate = as.numeric(estimate),
         conf.low = as.numeric(conf.low),
         conf.high = as.numeric(conf.high))

prop_non_verbal <- prop_non_verbal %>% 
  mutate(estimate = as.numeric(estimate),
         conf.low = as.numeric(conf.low),
         conf.high = as.numeric(conf.high))

prop_verbal <- prop_verbal %>% 
  arrange(desc(estimate))

prop_non_verbal <- prop_non_verbal %>% 
  arrange(desc(estimate))

prop_verbal$resp <- "Verbal"
prop_non_verbal$resp <- "Non-Verbal"

# Stack both tables; bind_rows() fills the count column each table lacks with NA
resp_per_int <- bind_rows(prop_verbal, prop_non_verbal)

# n is the number of responses of the row's own response type
resp_per_int <- resp_per_int %>% 
  mutate(n = coalesce(n_verbal, n_non_verbal)) %>% 
  select(-n_non_verbal, -n_verbal)

resp_per_int$resp <- as.factor(resp_per_int$resp)

resp_per_int
```

```{r}
chart_3 <- resp_per_int %>%
  ungroup() %>% 
  mutate(pri_type = fct_reorder(pri_type, estimate)) %>% 
  ggplot(aes(label=conf.low, 
             label2=conf.high,
             label3=n))+
  geom_point(aes(x=estimate, y=pri_type, color=resp))+
  geom_errorbarh(aes(xmax = conf.high, 
                     xmin = conf.low, 
                     y = pri_type,
                     color=resp), height=0)+
  labs(title= "Proportions of Computer Response Type",
       y= "Person Interaction Type",
       x= "Percent of Responses",
       subtitle = "Bars show 95% confidence interval",
       color = "")+
  scale_x_continuous(labels = scales::percent)+
  # Apply theme_bw() first so it does not reset the title adjustment below
  theme_bw()+
  theme(plot.title = element_text(hjust = -0.45, vjust=2.12))

#chart_3 

ggplotly(chart_3) 
```


***

When we interact with voice-command technology, we use different types of interactions, for example to 'wake' the system ("Hey Siri..."), 'command' the system ("Play a song on Spotify"), or 'question' the system ("What is the temperature for today?").

These interaction types can be chained, as in "Hey Siri, play a song on Spotify". In that phrase, the primary interaction type is the command to have Siri play a song on Spotify.

On the Starship Enterprise, the crew interacts with the Computer through different primary 'Interaction Types'. Definitions and examples of these interaction types can be found below. 

| Interaction Type | Definition                                                                                                        | Examples                                                |
|------------------|-------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|
| Command          | Utterances that directly tell the computer what to do.                                                            | Run a diagnostic on the port nacelle.                   |
| Question         | Utterances that ask the computer for something.                                                                   | Where is Captain Picard?                                |
| Statement        | Utterances that don't tell the computer or ask it anything, but whose meaning is inferred.                        | Deck four. I wish to learn about Earth.                 |
| Password         | Utterances that contain a password.                                                                               | This is Captain Picard.                                 |
| Wake Word        | Key phrases used to activate the computer.                                                                        | Computer. Holodeck.                                     |
| Comment          | Utterances that have no intended action for the computer.                                                         | Excellent. Ferrazene has a complex molecular structure. |
| Conversation     | Utterances that are more like human conversation, such as phatic expressions, formalities, and colloquial speech. | Well, check it again! Then run it for us, dear.         |

Because the Computer on the Starship Enterprise can generate objects and display information without speaking, it is worth examining how often it responds verbally versus non-verbally (that is, through actions alone).

The visualization to the left shows the proportion of verbal versus non-verbal responses for each primary interaction type used by a person. This information can help us understand which types of interactions are more likely to result in verbal or non-verbal responses from the Starship Enterprise Computer.

From the visualization, we can see that Question and Password interactions are the most likely to result in a Verbal response from the Computer. Wake Word and Conversation interactions had small sample sizes, so their estimates are uncertain. All other interaction types result in Verbal and Non-Verbal responses at roughly equal rates.
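As a rough illustration of how each point and interval in the chart is derived, the sketch below tabulates responses by interaction type and runs `prop.test()` on one of them. The `toy` data frame and its values are hypothetical and are not taken from the dataset; only the column names `pri_type` and `nv_resp` mirror the ones used in the analysis.

```r
# Hypothetical utterances; these values are illustrative, not from the dataset
toy <- data.frame(
  pri_type = c("Command", "Command", "Command", "Command",
               "Question", "Question", "Question", "Question"),
  nv_resp  = c(TRUE, TRUE, FALSE, FALSE,
               FALSE, FALSE, FALSE, TRUE)
)

# Count responses per interaction type; nv_resp == TRUE means non-verbal
counts <- table(toy$pri_type, toy$nv_resp)
counts

# Proportion of verbal responses (nv_resp == FALSE) among questions
n_verbal <- counts["Question", "FALSE"]
n_total  <- sum(counts["Question", ])

# prop.test() warns about the approximation when counts are this small
test <- suppressWarnings(prop.test(x = n_verbal, n = n_total))
test$estimate  # 3 of 4 responses, i.e. 0.75
test$conf.int  # 95% confidence interval, wide because the sample is tiny
```

With only four observations the interval spans a large part of the 0 to 1 range, which is why interaction types with few utterances should be read with caution.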

Source: http://www.speechinteraction.org/TNG/TeaEarlGreyHotDatasetCodeBook.pdf

### Chart 4: Stacey

```{r}

```

### Chart 5: Courtney

```{r}
# Packages
library(wordcloud)
library(RColorBrewer)
library(wordcloud2)
library(tm)
library(tidyverse)

# Filter to necessary column
text <- startrek$interaction

# Clean text
docs <- Corpus(VectorSource(text))

docs <- docs %>%
  tm_map(removeNumbers) %>%
  tm_map(removePunctuation) %>%
  tm_map(stripWhitespace)

docs <- tm_map(docs, content_transformer(tolower))

docs <- tm_map(docs, removeWords, stopwords("english"))

# Create matrix with counts
dtm <- TermDocumentMatrix(docs)

# Named term_matrix to avoid shadowing base::matrix
term_matrix <- as.matrix(dtm) 

words <- sort(rowSums(term_matrix),decreasing = TRUE) 

df <- data.frame(word = names(words), freq = words)

# Wordcloud
wordcloud2(data = df, size = 3, color= "random-light", shape = "cardioid", backgroundColor = "black")

```